Genomic data integration using guided clustering
Similar resources
Integration Analysis of Diverse Genomic Data Using Multi-clustering Results
In modern data mining applications, clustering algorithms are among the most important approaches, because these algorithms group elements in a dataset according to their similarities, and they do not require any class label information. In recent years, various methods for ensemble selection and clustering result combinations have been designed to optimize clustering results. Moreover, conduct...
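One widely used way to combine several clustering results is a co-association matrix. This is shown here only as an illustration, not as the method of the paper above, whose abstract is truncated. Entry (i, j) is the fraction of clustering runs that place items i and j in the same cluster. The function name `co_association` and the input format (one label list per run) are assumptions of this sketch.

```python
def co_association(partitions):
    """Pairwise co-clustering frequencies across several clustering runs.

    partitions: one list of cluster labels per run, all labelling the same
    n items in the same order. Entry [i][j] of the result is the fraction
    of runs that place items i and j in the same cluster.
    """
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("every partition must label the same items")
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    n_runs = len(partitions)
    return [[c / n_runs for c in row] for row in counts]


# Three hypothetical clustering runs over the same four items.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 1]]
matrix = co_association(runs)
```

The matrix compares runs by co-membership rather than by label value, so label permutations between runs do not matter. It can then be treated as a similarity matrix and clustered again to obtain a consensus partition.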
Labeling Unlabeled Data using Cross-Language Guided Clustering
The effort required to build a classifier for a task in a target language can be significantly reduced by utilizing the knowledge gained during an earlier effort of model building in a source language for a similar task. In this paper, we investigate whether unlabeled data in the target language can be labeled given the availability of labeled data for a similar domain in the source language. W...
Biological Data Mining for Genomic Clustering Using Unsupervised Neural Learning
The paper aims at designing a scheme for automatic identification of a species from its genome sequence. A set of 64 three-tuple keywords is first generated using the four types of bases: A, T, C and G. These keywords are searched on N randomly sampled genome sequences, each of a given length (10,000 elements) and the frequency count for each of the 4³ = 64 keywords is performed to obtain a DNA-...
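The keyword-counting step described above can be sketched as follows. The abstract is cut off, so several details are assumptions: windows are counted with overlap (a step of one base), windows containing characters other than A, T, C, G are skipped, and sample start positions are drawn uniformly at random. The names `keyword_counts` and `sample_count_vectors` are hypothetical.

```python
import random
from itertools import product

BASES = "ATCG"
# All 4**3 = 64 three-letter keywords over the bases A, T, C, G.
KEYWORDS = ["".join(t) for t in product(BASES, repeat=3)]


def keyword_counts(sequence):
    """Return the count of each keyword in `sequence`, in KEYWORDS order.

    Assumption: overlapping windows (step of one base) are counted, and
    windows containing any other character (e.g. N) are skipped.
    """
    counts = dict.fromkeys(KEYWORDS, 0)
    seq = sequence.upper()
    for i in range(len(seq) - 2):
        word = seq[i:i + 3]
        if word in counts:
            counts[word] += 1
    return [counts[k] for k in KEYWORDS]


def sample_count_vectors(genome, n_samples, length=10_000, seed=None):
    """Count keywords in `n_samples` random windows of `length` bases.

    Hypothetical helper: uniform random start positions are an assumption.
    """
    if len(genome) < length:
        raise ValueError("genome is shorter than the sampling window")
    rng = random.Random(seed)
    vectors = []
    for _ in range(n_samples):
        start = rng.randrange(len(genome) - length + 1)
        vectors.append(keyword_counts(genome[start:start + length]))
    return vectors
```

Each sampled window yields a 64-dimensional count vector. According to the title, such vectors are then grouped with unsupervised neural learning; that step is not shown here.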
Kernel-based Integration of Genomic Data using Semidefinite Programming
An important challenge in bioinformatics is to leverage different descriptions of the same data set, each capturing different aspects of the data. Many such sources of information [about genes and proteins] are now available, such as sequence, ...
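The abstract is cut off before it describes the method. A common formulation of kernel-based integration, assumed here, represents each data source as a kernel (similarity) matrix over the same genes and combines the sources as a weighted sum K = Σ μᵢ Kᵢ with nonnegative weights μᵢ. The sketch covers only that combination step. The weights are hand-picked, the trace normalization is an assumption, and it does not solve the semidefinite program that, per the title, would choose the weights. The function names and toy data are hypothetical.

```python
import numpy as np


def normalize_trace(kernel):
    """Rescale a kernel matrix so that its trace equals its size n."""
    trace = np.trace(kernel)
    if trace <= 0:
        raise ValueError("kernel must have a positive trace")
    return kernel * (kernel.shape[0] / trace)


def combine_kernels(kernels, weights):
    """Weighted sum of trace-normalized kernel matrices over the same n items.

    The weights are fixed by the caller; learning them (e.g. by semidefinite
    programming) is not attempted here. Nonnegative weights keep the sum
    positive semidefinite.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    return sum(w * normalize_trace(k) for w, k in zip(weights, kernels))


# Toy example: two hypothetical feature tables describing the same 5 genes,
# each turned into a linear kernel X @ X.T.
rng = np.random.default_rng(0)
expression = rng.normal(size=(5, 3))
sequence_features = rng.normal(size=(5, 4))
combined = combine_kernels(
    [expression @ expression.T, sequence_features @ sequence_features.T],
    [0.7, 0.3],
)
```

Normalizing each kernel to the same trace keeps a source with large raw feature values from dominating the sum, so the weights alone express how much each description of the data contributes.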
Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering
As increasingly more genomes are sequenced, the vast majority of proteins may only be annotated computationally, given experimental investigation is extremely costly. This highlights the need for computational methods to determine protein functions quickly and reliably. We believe dividing a protein family into subtypes which share specific functions uncommon to the whole family reduces the fun...
Journal
Journal title: Bioinformatics
Year: 2011
ISSN: 1460-2059, 1367-4803
DOI: 10.1093/bioinformatics/btr363